A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
We show that a large class of Estimation of Distribution Algorithms (EDAs),
including, but not limited to, Covariance Matrix Adaptation, can be written as a
Monte Carlo Expectation-Maximization (EM) algorithm, and as exact EM in the limit of
infinite samples. Because EM sits on a rigorous statistical foundation and has
been thoroughly analyzed, this connection provides a new coherent framework
with which to reason about EDAs.
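To make the correspondence concrete, here is a minimal sketch of a one-dimensional Gaussian EDA written in Monte Carlo EM form: sampling from the current search distribution is the Monte Carlo step, normalized fitness weights play the role of the E-step's posterior weights, and a weighted maximum-likelihood refit of the mean and variance is the M-step. The exponential fitness weighting, the `temperature` parameter, and the name `gaussian_eda` are illustrative assumptions; CMA-ES itself uses rank-based weights plus step-size and evolution-path machinery that this sketch omits.

```python
import math
import random


def gaussian_eda(fitness, mu=0.0, sigma=1.0, n_samples=200, n_iters=50,
                 temperature=1.0, seed=0):
    """Minimal 1-D Gaussian EDA written as Monte Carlo EM (illustrative sketch).

    fitness: objective to maximize, supplied by the caller.
    temperature: hypothetical scale used to turn fitness values into weights.
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Monte Carlo step: draw candidates from the current search distribution.
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # E-step analogue: weights proportional to exp(fitness / temperature),
        # normalized to sum to one (subtracting the max avoids overflow).
        scores = [fitness(x) / temperature for x in xs]
        top = max(scores)
        raw = [math.exp(s - top) for s in scores]
        total = sum(raw)
        w = [r / total for r in raw]
        # M-step: weighted maximum-likelihood refit of the Gaussian.
        mu = sum(wi * xi for wi, xi in zip(w, xs))
        var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, xs))
        sigma = max(math.sqrt(var), 1e-6)  # floor keeps sampling well defined
    return mu, sigma
```

For instance, `gaussian_eda(lambda x: -(x - 3.0) ** 2)` moves the mean close to the maximizer 3.0 while the spread shrinks. With exponential weights and a quadratic objective, each infinite-sample update multiplies the current Gaussian by a Gaussian-shaped weight, which is why the search distribution concentrates around the optimum.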
Prediction-Powered Inference
We introduce prediction-powered inference, a framework for
performing valid statistical inference when an experimental data set is
supplemented with predictions from a machine-learning system. Our framework
yields provably valid conclusions without making any assumptions on the
machine-learning algorithm that supplies the predictions. Higher accuracy of
the predictions translates to smaller confidence intervals, permitting more
powerful inference. Prediction-powered inference yields simple algorithms for
computing valid confidence intervals for statistical objects such as means,
quantiles, and linear and logistic regression coefficients. We demonstrate the
benefits of prediction-powered inference with data sets from proteomics,
genomics, electronic voting, remote sensing, census analysis, and ecology. Code is available at
https://github.com/aangelopoulos/prediction-powered-inferenc
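For the simplest case, a population mean, the prediction-powered estimate averages the model's predictions on the large unlabeled set and then subtracts a "rectifier", the average prediction error measured on the small labeled set. The sketch below implements that estimator with a normal-approximation confidence interval using only the Python standard library. The function name and argument layout are hypothetical and are not the API of the linked repository; it assumes the labeled and unlabeled samples are independent and i.i.d.

```python
import math
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, preds_labeled, preds_unlabeled, alpha=0.1):
    """Prediction-powered (1 - alpha) confidence interval for a mean (sketch).

    y_labeled:       gold-standard outcomes on the labeled set (size n).
    preds_labeled:   model predictions on those same labeled points.
    preds_unlabeled: model predictions on the unlabeled set (size N).
    """
    n, N = len(y_labeled), len(preds_unlabeled)
    # Rectifier: average prediction error measured where labels exist.
    errors = [f - y for f, y in zip(preds_labeled, y_labeled)]
    rectifier = fmean(errors)
    # Point estimate: mean prediction on unlabeled data, debiased by the rectifier.
    estimate = fmean(preds_unlabeled) - rectifier
    # Standard error combines both independent sources of variability.
    se = math.sqrt(variance(preds_unlabeled) / N + variance(errors) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se
```

Because the variance of the prediction errors enters the standard error directly, more accurate predictions give a narrower interval, matching the abstract's claim; the bias correction itself does not require the predictions to be accurate.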
Augmenting biologging with supervised machine learning to study in situ behavior of the medusa Chrysaora fuscescens
© The Author(s), 2019. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Fannjiang, C., Mooney, T. A., Cones, S., Mann, D., Shorter, K. A., & Katija, K. Augmenting biologging with supervised machine learning to study in situ behavior of the medusa Chrysaora fuscescens. Journal of Experimental Biology, 222, (2019): jeb.207654, doi:10.1242/jeb.207654.
Zooplankton play critical roles in marine ecosystems, yet their fine-scale behavior remains poorly understood because of the difficulty in studying individuals in situ. Here, we combine biologging with supervised machine learning (ML) to propose a pipeline for studying in situ behavior of larger zooplankton such as jellyfish. We deployed the ITAG, a biologging package with high-resolution motion sensors designed for soft-bodied invertebrates, on eight Chrysaora fuscescens in Monterey Bay, using the tether method for retrieval. By analyzing simultaneous video footage of the tagged jellyfish, we developed ML methods to: (1) identify periods of tag data corrupted by the tether method, which may have compromised prior research findings, and (2) classify jellyfish behaviors. Our tools yield characterizations of fine-scale jellyfish activity and orientation over long durations, and we conclude that it is essential to develop behavioral classifiers on in situ rather than laboratory data.
This work was supported by the David and Lucile Packard Foundation (to K.K.), the Woods Hole Oceanographic Institution (WHOI) Green Innovation Award (to T.A.M., K.K. and K.A.S.) and National Science Foundation (NSF) DBI collaborative awards (1455593 to T.A.M. and K.A.S.; 1455501 to K.K.). Deposited in PMC for immediate release.
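The abstract does not spell out the classifier, so the following is only a generic illustration of the usual shape of such a pipeline: cut the tag's tri-axial acceleration into fixed-length windows, summarize each window with simple features, learn from windows labeled using the simultaneous video, and classify new windows. The per-axis mean and standard-deviation features, the nearest-centroid classifier, and all function names are hypothetical stand-ins, not the authors' methods.

```python
import math
from statistics import fmean, pstdev


def window_features(samples, window):
    """Split (ax, ay, az) samples into non-overlapping windows.

    Returns, per window, the per-axis means followed by per-axis standard deviations.
    """
    feats = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        axes = list(zip(*chunk))  # one tuple of values per axis
        feats.append([fmean(a) for a in axes] + [pstdev(a) for a in axes])
    return feats


def fit_centroids(features, labels):
    """Average the feature vectors of each video-labeled behavior class."""
    groups = {}
    for f, lab in zip(features, labels):
        groups.setdefault(lab, []).append(f)
    return {lab: [fmean(col) for col in zip(*fs)] for lab, fs in groups.items()}


def predict(centroids, feature):
    """Assign the behavior whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], feature))
```

When evaluating such a classifier, holding out whole deployments (individual animals) keeps windows from the same jellyfish out of both the training and test sets at once.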
Optimal arrays for compressed sensing in snapshot-mode radio interferometry
Context. Radio interferometry has always faced the problem of incomplete
sampling of the Fourier plane. A possible remedy can be found in the promising new theory
of compressed sensing (CS), which allows for the accurate recovery of sparse signals from
sub-Nyquist sampling given certain measurement conditions.
Aims. We provide an introductory assessment of optimal arrays for CS in
snapshot-mode radio interferometry, using orthogonal matching pursuit (OMP), a widely used
CS recovery algorithm similar in some respects to CLEAN. We focus on comparing centrally
condensed (specifically, Gaussian) arrays to uniform arrays, and randomized arrays to
deterministic arrays such as the VLA.
Methods. The theory of CS is grounded in a) sparse
representation of signals and b) measurement matrices of low coherence.
We calculate the mutual coherence of measurement matrices as a theoretical indicator of
arrays’ suitability for OMP, based on the recovery error bounds in Donoho et al. (2006,
IEEE Trans. Inform. Theory, 52, 1289). We also run OMP reconstructions of both point and extended
objects on simulated incomplete data. Optimal arrays are considered for
objects represented in 1) the natural pixel basis and 2) the block discrete cosine
transform (BDCT).
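The mutual coherence of a measurement matrix A with columns a_j is μ(A) = max over i ≠ j of |⟨a_i, a_j⟩| / (‖a_i‖ ‖a_j‖); a standard noiseless result guarantees that OMP recovers every k-sparse signal when k < (1 + 1/μ)/2. The sketch below computes μ for a toy one-dimensional partial Fourier matrix in which each row samples one integer spatial frequency of an n-pixel image. Treating an array's baselines as a 1-D set of integer frequencies is a simplifying assumption for illustration, and `partial_fourier` is a hypothetical helper, not part of the paper.

```python
import cmath
import math


def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns of A."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(sum(abs(v) ** 2 for v in c)) for c in cols]
    mu = 0.0
    for p in range(n):
        for q in range(p + 1, n):
            # Complex inner product <a_p, a_q> = sum of conj(a_p) * a_q.
            ip = sum(complex(a).conjugate() * b for a, b in zip(cols[p], cols[q]))
            mu = max(mu, abs(ip) / (norms[p] * norms[q]))
    return mu


def partial_fourier(freqs, n):
    """Rows of an n-point DFT matrix at the given integer frequencies (pixel basis)."""
    return [[cmath.exp(-2j * math.pi * u * x / n) for x in range(n)] for u in freqs]
```

For example, sampling only the 8 lowest frequencies of a 32-pixel grid gives a coherence of about 0.90, because neighboring pixels are nearly indistinguishable, whereas sampling all 8 frequencies of an 8-pixel grid yields the full DFT, whose columns are orthogonal (μ ≈ 0).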
Results. We find that reconstructions of the pixel representation
perform best with the uniform random array, while reconstructions of the BDCT
representation perform best with normal random arrays. Slightly randomizing the VLA
also dramatically improves its performance for CS recovery in the pixel basis.
Conclusions. In the pixel basis, array design for CS reflects known
principles of array design for small numbers of antennas, namely randomness and uniform
distribution. Differing results with the BDCT, however, emphasize the need to study how
sparsifying bases affect array design before CS can be optimized for radio
interferometry.
Toward Trustworthy Scientific Inquiry and Design with Machine Learning
The last decade has witnessed rapid development and deployment of machine-learning systems across science. Such systems can supply predictions about scientific phenomena far more quickly and cheaply than gold-standard experiments, and are being used in efforts to both discover scientific knowledge and design new biomolecules. However, an important question remains unanswered: since machine-learning systems make errors, how can we use them in a trustworthy way for scientific discovery and design? This dissertation takes steps toward helping to ensure that the biomolecules we design and the scientific conclusions we draw using machine learning can be trusted.
We begin in the setting of machine-learning-based design. The goal in this setting is to propose novel objects such as proteins, small molecules, or materials with desired properties, in a way that is guided by machine-learning models of such properties. Toward addressing model trustworthiness for design, we propose (i) a method for learning models that accounts for the distribution shifts inherent to design, and (ii) a method for constructing statistically valid confidence sets for the properties of objects designed using machine learning.
Finally, we examine the trustworthy use of machine learning for drawing scientific conclusions. In particular, we consider the increasingly relevant setting of treating predictions made by machine-learning systems as “data” in estimating quantities of scientific interest. We propose prediction-powered inference, a novel statistical framework for constructing valid confidence sets in this setting, which enables researchers to incorporate evidence from machine-learning systems into their scientific inquiry in a standardized and principled way.
Area-only method for underwater object tracking using autonomous vehicles
OCEANS 2019, 17-20 June 2019, Marseille. 9 pages, 6 figures, 1 table.
The use of autonomous underwater vehicles for ocean research has increased because they have a better cost/performance ratio than crewed oceanographic vessels. For example, autonomous vehicles (e.g., a Wave Glider) can be used to localise and track underwater targets. Whereas other researchers have focused on target tracking using acoustic modems, here we present a novel method called area-only target tracking. This method works with commercially available acoustic tags, thereby reducing cost and complexity compared with other tracking systems. Moreover, because these tags are small, the method can be used to track small targets such as jellyfish. We describe the methodology behind the area-only technique and present results from the first field tests, conducted in the Monterey Bay area.
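One way to picture the area-only idea: each time the vehicle hears (or fails to hear) the tag, the target must lie inside (or outside) the tag's detection area around the vehicle's position, and intersecting these areas narrows down where the target is. The grid-based sketch below assumes a static target, a fixed circular detection radius, and flat 2-D geometry; these assumptions and the name `area_only_estimate` are hypothetical and are not taken from the paper, whose estimator for a moving target is not described here.

```python
import math


def area_only_estimate(events, radius, xlim, ylim, step=1.0):
    """Toy area-only localization on a grid (illustrative sketch).

    events: list of ((vx, vy), detected) pairs: vehicle position and whether
            the tag was heard there.
    radius: assumed fixed detection range, in the same units as the positions.
    Returns the centroid of grid points consistent with every event, or None.
    """
    nx = int(round((xlim[1] - xlim[0]) / step)) + 1
    ny = int(round((ylim[1] - ylim[0]) / step)) + 1
    consistent = []
    for i in range(nx):
        for j in range(ny):
            x, y = xlim[0] + i * step, ylim[0] + j * step
            # Keep the point only if it lies inside the detection disc for every
            # detection and outside it for every non-detection.
            if all((math.hypot(x - vx, y - vy) <= radius) == detected
                   for (vx, vy), detected in events):
                consistent.append((x, y))
    if not consistent:
        return None
    cx = sum(p[0] for p in consistent) / len(consistent)
    cy = sum(p[1] for p in consistent) / len(consistent)
    return cx, cy
```

Non-detections carry information too: in this toy model, a missed detection carves the vehicle's detection disc out of the candidate region, which can shift the estimate even when no new detection arrives.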